Memory striping approach that interleaves sub protected data words

ABSTRACT

An apparatus is described. The apparatus includes a memory controller having logic circuitry to write a unit of write data into a plurality of memory chips according to a striping pattern that includes multiple protected sub words, each protected sub word including a smaller portion of the unit of write data and error correction coding (ECC) information calculated from the smaller portion of the unit of write data.

FIELD OF INVENTION

The field of invention pertains generally to the computing sciences, and, more specifically, to a memory striping approach that interleaves sub protected data words.

BACKGROUND

FIG. 1a shows a pair of “8+2” memory channels 101, 102, each having eight memory chips and two error correction code (ECC) chips. Both of the depicted 8+2 memory channels conform to a Joint Electron Device Engineering Council (JEDEC) Double Data Rate “5” (DDR5) memory “sub-channel” implementation. Here, each memory chip is an “X4” memory chip, and nominal read or write bursts consist of 16 cycles to transfer 512 bits (b) = 64 bytes (B) of data and 128b of ECC that protects the data ((16 cycles)×(8 data chips)×(4 bits/chip) = 512b = 64B).
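The burst arithmetic above can be checked in a few lines of Python. This is a sketch only: the constants restate the x4 width and 16-cycle burst given in the text, and `burst_bits` is an illustrative helper name, not part of any described interface.

```python
# Burst arithmetic for one DDR5 "8+2" sub-channel built from x4 chips.
BITS_PER_CHIP = 4   # x4 devices: 4 data bits per chip per cycle
BURST_CYCLES = 16   # nominal read/write burst length


def burst_bits(chips: int, cycles: int = BURST_CYCLES, width: int = BITS_PER_CHIP) -> int:
    """Bits moved by `chips` devices over `cycles` transfer cycles."""
    return chips * cycles * width


data_bits = burst_bits(8)  # eight data chips
ecc_bits = burst_bits(2)   # two ECC chips
print(data_bits, data_bits // 8, ecc_bits)  # bits of data, bytes of data, bits of ECC
```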

The block of information having the unit of data to be transferred 103 and the ECC information that protects the unit of data 104 can be referred to as an “ECC protected word of data” 105 (or, more simply, a “protected word of data” or “protected data word”).

If one of the ten memory chips within one of the channels (assume channel 101) begins to fail, incorrect information will be present in one or more bit locations of a protected data word 105 where the corresponding content is stored by the failing chip. In response, the host (e.g., memory controller) processes the protected data word's data 103 and ECC information 104 blocks to correct the incorrect information and identify the failing chip.

Thus, the channel can continue to operate even though one of its memory chips is failing. However, if another (second) memory chip on the same channel fails (such that two of the channel's ten memory chips are failing), the incorrect information within the protected data word 105 cannot be corrected.

As such, in response to a failed memory chip on an 8+2 memory channel, various computer systems are configured to switch over (adapt) from the pair of 8+2 memory channels 101, 102 (a first 101 having the failed chip and a second 102 that does not have any failed chips) to a single 16+2 configuration (e.g., adaptive double device data correction (ADDDC)).

That is, upon the failure of the first memory chip on the first 8+2 channel 101, the failing memory chip is put out of use (retired), and protected data words that used to be stored only on the first channel 101 (pre-failure) are instead spread over the first and second channels 101, 102 (post-failure). Likewise, protected data words that used to be stored only on the second channel 102 are also spread over the first and second channels 101, 102.

FIG. 1b shows the two 8+2 channels 101, 102 of FIG. 1a after being re-configured to operate according to a 16+2 scheme. As observed in FIG. 1b, the bad chip 106 of the first sub-channel is identified as bad and is not used. Eight of the remaining nine good memory chips of the first channel 101 are used for data, and the last (ninth) good memory chip of the first channel 101 is used for ECC. The second channel is arranged similarly (eight memory chips are used for data, one memory chip is used for ECC and one memory chip is a spare).

The resulting configuration is a 16+2 memory channel having sixteen memory chips used for data and two memory chips used for ECC. According to the 16+2 configuration, a protected data word 107 having a 64B unit of data (108a, 108b) and 64 bits of corresponding ECC information 109 is read/written in eight cycles ((8 cycles)×(16 data chips)×(4 bits/chip) = 512 bits = 64B). (If only one of the regions has a bad chip, a 16+3 configuration can be used to increase the ECC coverage or to allow more cache line bits to be used for other control functions (i.e., as cache-line meta bits).)
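Applying the same arithmetic to the 16+2 configuration shows that sixteen data chips move 64B in half the cycles. That leaves room for two protected words 107, 110 in a 16-cycle window. The sketch below restates only numbers from the text, and `burst_bits` is an illustrative helper.

```python
BITS_PER_CHIP = 4  # x4 devices


def burst_bits(chips: int, cycles: int, width: int = BITS_PER_CHIP) -> int:
    """Bits moved by `chips` devices over `cycles` transfer cycles."""
    return chips * cycles * width


# Post-failure 16+2 configuration: one protected data word per 8 cycles.
data_bits = burst_bits(16, 8)  # sixteen data chips
ecc_bits = burst_bits(2, 8)    # two ECC chips
# Two such words (107, 110) are transferred back to back in 16 cycles.
words_per_16_cycles = 16 // 8
print(data_bits, ecc_bits, words_per_16_cycles)
```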

FIG. 1b shows two such protected data words 107, 110 being read/written in sequence over 16 cycles. Here, comparing the pre-fail 8+2 configurations of FIG. 1a with the post-fail 16+2 configuration of FIG. 1b, the expansion of the number of data memory chips from 8 to 16 allows for less ECC information (128b to 64b) per protected word of data 107, 110.

In order to effect 16+2 operation, the pair of channels 101, 102 operates in “lock-step”, meaning the same address is used for the same cycle number across the two channels 101, 102. Here, a memory channel (whether a sub-channel or otherwise) includes a data bus and (e.g., ranks of) memory chips that are addressed with a same address value. Different memory channels in a same memory system, unless operating in lock-step, can concurrently address their respective memory chips with different addresses.

Although the channels 101, 102 can operate in lock-step simultaneously (the same cycle number exists at the same time for both channels), in theory, simultaneous execution is not a strict requirement (the different channels 101, 102 can read/write their respective “halves” of a protected data word at different absolute times).

FIGURES

A better understanding of the present invention can be obtained from the following detailed description in conjunction with the following drawings, in which:

FIG. 1a shows a pair of 8+2 memory configurations;

FIG. 1b shows a 16+2 memory configuration;

FIG. 2 shows a first embodiment of an improved memory striping approach;

FIG. 3 shows a second embodiment of an improved memory striping approach;

FIG. 4 shows a third embodiment of an improved memory striping approach;

FIG. 5 shows memory channels coupled to a memory controller;

FIG. 6 shows a computing system;

FIG. 7 shows a data center;

FIG. 8 shows multiple racks.

DESCRIPTION

A problem with the switch-over from a pair of independently operating 8+2 channels (FIG. 1a) to a 16+2 configuration (FIG. 1b) is that the “striping” (which specific bits of which specific data or ECC field within a protected data word are written into which specific memory chip) completely changes for all 20 memory chips of both channels 101, 102.

The drastic striping change results in extended down time or other memory interruption in which, e.g., all the content must be read from the pair of 8+2 configured channels and re-written into the 16+2 configuration according to the new 16+2 striping pattern. In essence, the switch-over has a “blast radius” that affects the information content of every memory chip region in the pair of channels 101, 102, even though only one of the channels (101) has a bad memory chip region 106.

A better approach is to confine the blast radius to the memory chips of the channel 101 having the failing chip region 106. So doing will interrupt the 8+2 channel 101 having the bad chip, but all other 8+2 channels (such as channel 102) will remain unaffected and uninterrupted by the memory chip failure.

FIG. 2 shows an embodiment of the improved approach, in which the single channel 101 having the failing chip region is re-striped so that the amount of ECC information per 64B unit of data is increased. In the particular embodiment of FIG. 2, as explained in more detail below, the data content 103 of a pre-failure protected data word 105 (FIG. 1a) is broken down into two smaller “protected sub words” 211, 212, each having 32B of data and 128b of ECC. Here, the ratio of ECC to data is higher in the protected sub words 211, 212 (128b:256b = 1:2) than in the pre-failure protected word 105 (128b:512b = 1:4), which allows the channel 101 to recover corrupted data if another (second) memory chip in the channel 101 fails.

Moreover, consistent with the blast radius characteristic, note that in the prior art scenario described in the Background, the configuration that is created in response to the chip failure (16+2 = 18 memory chips) preserves the data capacity of the channel that suffered the chip failure (8+2 = 10 memory chips), since each channel still contributes eight data chips. By contrast, in the improved approach of FIG. 2, the capacity of the channel that is created in response to the chip failure (8 memory chips) is smaller than the capacity of the channel that suffered the chip failure (8+2 = 10 memory chips). Thus, the re-striping that is responsive to a chip failure requires the software to operate with a smaller amount of physical memory. Software will need to react appropriately to this new physical memory limitation.

As described immediately below, after re-striping, information is transferred across eight of the remaining nine memory chips (the ninth memory chip is regarded as a spare and can be called into use if the channel suffers a second memory chip failure). The host treats the information transferred over the eight memory chips as organized into different blocks of data (D1, D2, D3 and D4) and ECC information (ECC1 and ECC2), which the host is able to organize/arrange into the pair of protected sub words (specifically, a first protected sub word corresponds to D1+D2+ECC1 and a second protected sub word corresponds to D3+D4+ECC2). During a read, the host processes the sub words separately. That is, D1, D2 and ECC1 are processed together to correct any errors in D1 and D2, and D3, D4 and ECC2 are processed together to correct any errors in D3 and D4.

After both sub words are corrected, the host combines D1, D2, D3 and D4 to form the original 64B unit of data. Here, e.g., the larger computer accesses/addresses memory in data units of 64B because, e.g., caches between the machine's processors and memory are organized into 64B cache line slots. Thus, as far as the computer is concerned, memory is still accessed/addressed in 64B data units. The memory channel 101, however, has been re-striped to protect against a second memory chip failure by breaking a 64B unit of data into two separate protected sub words, each having a data unit size of 32B.

As mentioned above, the increase in total ECC information per 64B unit of data provides sufficient protection to maintain error correction in the event that a next (second) memory chip in the channel fails. This stands in contrast to the approach described in the Background where, upon switchover to a 16+2 scheme, the number of memory chips per 64B of data is expanded (108a, 108b) to reduce the number of errors per protected word, which, in turn, allows for a reduction in the amount of ECC information per 64B unit of data (64b:64B = 64b:512b = 1:8).

Thus, whereas the prior art approach of switching over to a 16+2 configuration reduces the ratio of ECC to data per protected data word relative to the pre-failure 8+2 configuration (from 1:4 to 1:8), the new approach of FIG. 2 increases the ratio of ECC to data per protected data word (from 1:4 to 1:2, i.e., ECC becomes one third of the stored bits).
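The ECC-to-data ratios compared above follow directly from the bit counts given for each configuration. The Python sketch below reproduces them. The helper name `ecc_to_data` and the dictionary labels are illustrative. The sketch also computes the share of all stored bits that is ECC in a FIG. 2 sub word.

```python
from fractions import Fraction


def ecc_to_data(ecc_bits: int, data_bits: int) -> Fraction:
    """ECC-to-data ratio of one protected (sub) word, reduced to lowest terms."""
    return Fraction(ecc_bits, data_bits)


ratios = {
    "8+2 protected word 105": ecc_to_data(128, 512),
    "16+2 protected word 107": ecc_to_data(64, 512),
    "FIG. 2 protected sub word 211": ecc_to_data(128, 256),
}
for name, r in ratios.items():
    print(f"{name}: {r.numerator}:{r.denominator}")

# Share of all stored bits that is ECC in a FIG. 2 sub word: 128 / (128 + 256).
ecc_share = Fraction(128, 128 + 256)
```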

As observed in FIG. 2, the increase in the ratio of ECC information per protected data word is effected by re-striping into blocks D1 through D4 and ECC1, ECC2 as described above, and by consuming more cycles per transfer of 64B of data. That is, whereas the pre-fail memory channel 101 of FIG. 1a consumes 16 cycles to transfer 64B of protected data 103, the improved approach of FIG. 2 consumes 24 cycles (16+8).
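A quick consistency check confirms that the 24 cycles hold exactly two protected sub words. The sketch assumes the eight active x4 chips described for FIG. 2. It counts the half burst as 8 cycles, following the convention stated in the next paragraph.

```python
BITS_PER_CHIP = 4  # x4 devices
ACTIVE_CHIPS = 8   # eight of the nine surviving chips; the ninth is held as a spare

full_burst, half_burst = 16, 8
total_cycles = full_burst + half_burst
capacity = total_cycles * ACTIVE_CHIPS * BITS_PER_CHIP  # bits moved per 64B access
needed = 2 * (256 + 128)                                # two sub words: 32B data + 128b ECC each
print(total_cycles, capacity, needed)
```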

In various implementations, half (chop) bursts of 8 cycles actually consume 16 cycles, such that 32 total cycles are consumed by the improved approach. For ease of discussion, this aspect is disregarded in the discussions that follow. That is, “8 cycles” means that an amount of information equal to 8W is transferred, where W is the bus width, irrespective of how many cycles are actually consumed.

Even more generally, the different transfers observed in FIG. 2 can be characterized as a “full burst” (16 cycles as depicted) and a “half burst” (8 cycles as depicted). Alternate embodiments could have different numbers of cycles and/or amounts of data per full burst and per half burst. For ease of discussion, the remainder of the description will refer to 16 cycles and 8 cycles. However, the reader should recognize that such transfers can be more generally described as a “full burst” and a “half burst”, respectively.

As can be seen in FIG. 2, the data and ECC blocks of the two different protected sub words 211, 212 are interleaved across the memory chips of channel 101 over 24 cycles to form a single unit of transfer of 64B of data. A number of alternate embodiments can assign data or ECC to different blocks than those depicted in FIG. 2 yet still form two protected sub words that each protect 32B of data with 128b of ECC.

A condition of the re-striping pattern, however, is that, for any particular protected sub word, each of the eight memory chips stores either data or ECC for that protected sub word, but not both. Notably, the condition is met for both protected sub words even though some memory chips store data for one of the protected sub words and ECC for the other (the two leftmost chips and the two rightmost chips).
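The striping condition can be checked mechanically once a layout is written down. The layout below is hypothetical and is not the actual block placement of FIG. 2. It was chosen only to match the stated bit counts and the statement that the two leftmost and two rightmost chips hold data for one sub word and ECC for the other. Sub words 1 and 2 stand in for 211 and 212. `LAYOUT` and the helper functions are illustrative names.

```python
from collections import defaultdict

BITS_PER_CHIP = 4  # x4 devices

# Hypothetical FIG. 2-style layout (NOT the figure's actual block placement).
# Entry: (first_cycle, last_cycle, first_chip, last_chip, sub_word, kind, block)
# Cycles 0-15 form the full burst, cycles 16-23 the half burst.
LAYOUT = [
    (0, 15, 0, 1, 2, "ecc", "ECC2"),
    (0, 15, 2, 3, 1, "data", "D2"),
    (0, 15, 4, 5, 2, "data", "D4"),
    (0, 15, 6, 7, 1, "ecc", "ECC1"),
    (16, 23, 0, 3, 1, "data", "D1"),
    (16, 23, 4, 7, 2, "data", "D3"),
]


def block_bits(layout):
    """Total bits held by each named block."""
    bits = defaultdict(int)
    for c0, c1, k0, k1, _, _, name in layout:
        bits[name] += (c1 - c0 + 1) * (k1 - k0 + 1) * BITS_PER_CHIP
    return dict(bits)


def satisfies_condition(layout):
    """True if no chip holds both data and ECC for the same protected sub word."""
    roles = defaultdict(set)
    for _, _, k0, k1, sub_word, kind, _ in layout:
        for chip in range(k0, k1 + 1):
            roles[(sub_word, chip)].add(kind)
    return all(len(kinds) == 1 for kinds in roles.values())


def bits_hit_by_failure(layout, failed_chip):
    """Bits of each sub word that would be corrupted if `failed_chip` fails."""
    hit = defaultdict(int)
    for c0, c1, k0, k1, sub_word, _, _ in layout:
        if k0 <= failed_chip <= k1:
            hit[sub_word] += (c1 - c0 + 1) * BITS_PER_CHIP
    return dict(hit)


sizes = block_bits(LAYOUT)
assert sizes["D1"] + sizes["D2"] == 256 and sizes["ECC1"] == 128  # sub word 1
assert sizes["D3"] + sizes["D4"] == 256 and sizes["ECC2"] == 128  # sub word 2
print(satisfies_condition(LAYOUT))
print(bits_hit_by_failure(LAYOUT, 0))
```

In this layout the full burst carries D2, D4, ECC1 and ECC2, and the half burst carries D1 and D3. That placement also fits the later remark that D1 and D3 could be moved to a half burst on another channel.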

The condition can be viewed as a translation into additional “effective” memory chips that, in terms of ECC coverage, effectively form a concurrent pair of 4+2 schemes. That is, although an additional 8 cycles are consumed to fully form both protected sub words 211, 212, once formed, the result is a 48-bit-wide data structure containing both protected sub words. Here, the 48-bit-wide data structure corresponds to a total of twelve effective memory chips (12×4 = 48), such as a pair of concurrently operating 4+2 configurations.
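The “effective chip” view can be expressed as a small calculation. The sketch assumes that each effective chip supplies 4 bits per cycle over a 16-cycle full burst. `effective_config` is an illustrative name.

```python
BITS_PER_CHIP = 4
FULL_BURST = 16


def effective_config(data_bits: int, ecc_bits: int,
                     width: int = BITS_PER_CHIP, cycles: int = FULL_BURST) -> tuple:
    """Express a protected (sub) word as an equivalent count of data and ECC chips,
    assuming each effective chip supplies `width` bits for `cycles` cycles."""
    per_chip = width * cycles
    if data_bits % per_chip or ecc_bits % per_chip:
        raise ValueError("block sizes do not map onto whole effective chips")
    return data_bits // per_chip, ecc_bits // per_chip


fig2_sub_word = effective_config(256, 128)  # 32B of data + 128b of ECC
combined_width = 2 * sum(fig2_sub_word) * BITS_PER_CHIP
print(fig2_sub_word, combined_width)
# A 64B sub word with 128b of ECC, as in the FIG. 3 and FIG. 4 embodiments discussed below.
print(effective_config(512, 128))
```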

Although the above discussion of the improved approach of FIG. 2 has emphasized keeping all re-striped information on the same channel 101, in theory, the 8 cycle burst can occur on a different memory channel than the memory channel where the 16 cycle burst occurs. Thus, for example, blocks D1 and D3 can be accessed during a half burst on another memory channel while blocks D2, D4 and both ECC blocks are being accessed on memory channel 101. (Thus, data and/or ECC information for a same protected sub word can be provided from different interfaces, or from a same interface as suggested by FIG. 2.) In this case, the incorporation of the additional ECC information is a memory capacity hit rather than a memory access time hit. In still other possible cases, the half burst is not performed simultaneously with the full burst, which results in the same memory capacity hit and an additional latency hit.

Whereas the embodiment of FIG. 2 was directed to re-striping in response to a failing memory chip on a memory system that transfers data in units of 64B, the embodiment of FIG. 3 is directed to a re-striping of a memory system that transfers data in units of 128B. A system that transfers data in units of 128B can, as just one example, operate like the 16+2 configuration described above with respect to FIG. 1b, but where the protected data word is formed as a combination of the protected data words 107, 110 observed in FIG. 1b (all four data units of words 107, 110 are combined to form the data that is protected by the combined ECC fields of words 107, 110).

The re-striping embodiment of FIG. 3, like the approach of FIG. 2, breaks the protected word of the prior configuration into two smaller protected sub words 311, 312, where the data unit of each protected sub word is half the size of the prior configuration's data unit. Specifically, with the prior configuration having a data unit size of 128B, the protected sub words 311, 312 each have a data unit size of 512b = 64B. Each smaller protected sub word has its respective data and ECC blocks processed in isolation from the other protected sub word. In the case of a read, if the read data from both protected sub words is valid, the respective 512b data units from both protected sub words are combined to form a final 1024b data unit.

As observed in FIG. 3, the striping approach includes two 448b data blocks D2, D4 that are dedicated to different protected sub words and that each consume 16 cycles across seven memory chips. Another 8 cycles are consumed transferring residue 64b data blocks D1, D3 for the different protected sub words and 128b ECC blocks ECC1, ECC2 for the different protected sub words. Here, each residue and ECC block pair consumes only 6 memory chips.

Thus, if the same memory chips used to transfer one of the large data blocks (e.g., D2) are also used to transfer the appropriate residue data block and ECC block pair (e.g., D3 and ECC2), the entire approach consumes only 14 memory chips (seven chips per transfer of a single large data block, residue block and ECC block). Thus, assuming the prior 16+2 configuration consumed 18 memory chips, the re-striping approach can be used to manage the failure of 4 memory chips from the 16+2 configuration.
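The chip budget of FIG. 3 follows from the block sizes given above. The sketch below assumes x4 chips and counts the half burst as 8 cycles. `chips_needed` is a hypothetical helper.

```python
BITS_PER_CHIP = 4  # x4 devices


def chips_needed(bits: int, cycles: int, width: int = BITS_PER_CHIP) -> int:
    """Chips required to move `bits` in `cycles` cycles; the block must fill them exactly."""
    chips, leftover = divmod(bits, cycles * width)
    if leftover:
        raise ValueError(f"{bits}b does not fill whole chips over {cycles} cycles")
    return chips


large = chips_needed(448, 16)  # D2 or D4: chips for a full burst
residue = chips_needed(64, 8)  # D1 or D3: chips for a half burst
ecc = chips_needed(128, 8)     # ECC1 or ECC2: chips for a half burst

# A residue + ECC pair fits on the chips that carry one large block.
assert residue + ecc <= large
total_chips = 2 * large          # chips used by both protected sub words
spare_budget = 18 - total_chips  # chips of the prior 16+2 configuration that may be retired
print(total_chips, spare_budget)
```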

Like the approach of FIG. 2, the approach of FIG. 3 confines the blast radius to the channel/chips used to implement the prior 16+2 configuration (in the case of FIG. 3, the re-striping can be in response to failure of a memory chip of a memory channel that nominally transfers data in 128B units). That is, for example, the only memory chips that are re-striped are memory chips that were components of the prior 16+2 configuration. Moreover, again like the approach of FIG. 2, the re-striping can increase the ratio of ECC to data from the prior configuration (e.g., from 1:8 in the 16+2 configuration of FIG. 1b to 1:4 in the approach of FIG. 3).

Again, for each protected sub word, the constraint of storing ECC information on different memory chips than the memory chips used to store data is honored (even though ECC and data for different protected sub words are kept on the same memory chips). Here, as explained immediately below, the striping of FIG. 3 effectively implements a separate 8+2 scheme for each protected sub word, which, in turn, allows errors to be corrected if another (e.g., fifth) memory chip fails.

As observed in FIG. 3, the residual data blocks D1, D3 and ECC blocks ECC1, ECC2 are re-shaped as drawn within the protected sub words 311, 312 to match the vertical height of the larger data blocks D2, D4. Here, generally, the structure of a protected word is defined and/or understood with the ECC information appended to the data block, where the ECC information and the data block have the same vertical height (so doing, e.g., defines the internal matrix computations used to create the ECC information from the data during a write operation, and to process the data and ECC information during a read operation).

With the residual data blocks D1, D3 and ECC blocks ECC1, ECC2 being re-shaped to expand from 8 cycles to 16 cycles in the vertical direction, their respective widths along the horizontal axis are reduced by half to keep their respective areas constant (keeping the areas constant is necessary to maintain the same ECC to data ratio within the protected sub word definition). The reduction by half along the horizontal axis means that a failing memory chip introduces fewer errors into the protected word, which contributes to the protected word's resilience against a chip failure.

For example, as drawn in protected sub words 311, 312 of FIG. 3, the residual data blocks D1 and D3 consume only one memory lane each. In reality, however, as depicted in the bus transfer diagrams, two memory chips are used to store the data of a single residual block D1. Thus, if one of these memory chips fails, the number of induced errors in the residual data block will be half of what it would have been if all of the residual data block's information were kept in a single memory chip, as suggested by the protected sub word diagrams 311, 312. A similar situation exists with respect to the ECC blocks ECC1, ECC2.
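The re-shaping and the halving of induced errors can be illustrated with two small functions. This is a sketch. `reshape` and `bits_lost` are illustrative names, and `bits_lost` assumes that a block is spread evenly over the chips holding it.

```python
def reshape(cycles: int, lanes: int, new_cycles: int) -> tuple:
    """Stretch a block to `new_cycles` rows while keeping its area (cycles x lanes) fixed."""
    area = cycles * lanes
    if area % new_cycles:
        raise ValueError("block area does not divide evenly into the new height")
    return new_cycles, area // new_cycles


def bits_lost(block_bits: int, chips_holding_block: int) -> int:
    """Bits of a block corrupted by one failing chip, assuming the block is spread evenly."""
    return block_bits // chips_holding_block


# FIG. 3 residue block D1 on the bus: 8 cycles x 2 chips (64 bits).
print(reshape(8, 2, 16))  # drawn as 16 cycles x 1 lane in the sub word view
# ECC1 on the bus: 8 cycles x 4 chips (128 bits).
print(reshape(8, 4, 16))  # drawn as 16 cycles x 2 lanes
# One failing chip corrupts half of D1 on the bus, versus all of it if D1 sat on one chip.
print(bits_lost(64, 2), bits_lost(64, 1))
```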

Regardless, the structure of the protected sub words 311, 312 of FIG. 3 indicates that, as with the approach of FIG. 2, a pair of concurrent configurations has effectively been implemented, here a pair of 8+2 configurations. That is, although implemented with as few as 14 memory chips, the re-striping provides error protection as if 20 memory chips were being used.

Similar to the embodiment of FIG. 2, the 8 cycle transfers of the residual data blocks D1, D3 and the ECC blocks ECC1, ECC2 in FIG. 3 can be performed with physical memory chips that are different than the physical memory chips used to store the larger data blocks D2 and D4. In this case, rather than the re-striping causing a memory access time hit (the 8 cycles and 16 cycles are performed sequentially because they use the same memory chips), the re-striping causes a memory capacity hit (the 8 cycles are performed simultaneously with the 16 cycles but use different memory chips). In theory, if different memory chips are used to store the residual data blocks and ECC blocks than the larger data blocks, the 8 cycles can, but are not required to, be performed simultaneously with the 16 cycles.

FIG. 4 shows another striping embodiment for, e.g., re-striping from a prior 16+2 configuration. As in the previous embodiments of FIGS. 2 and 3, a pair of protected sub words 411, 412 is created whose data units D2, D4 are half the size (512b = 64B) of the prior configuration's data unit (128B). During a read operation, the respective data and ECC information of each protected sub word are processed in isolation from the other protected sub word. If the data units from both protected sub words are valid, the pair of data units from both protected sub words is combined to yield a final read data unit of 128B.

In FIG. 4, a Dual In-line Memory Module (DIMM) is used that has the capability to have only half the chips of a single rank be written to. For example, the DIMM may have additional logic to process a chip enable signal such that only half the chips of a rank receive a chip enable for a specified read or write (e.g., a “half width” read or write command exists that, when sent to the DIMM, causes the DIMM to activate the chip enable signal for only half of the memory chips of the rank that is targeted by the command).

From FIG. 4, eight memory chips and sixteen cycles are consumed to transfer a 512b data unit for one of the protected sub words. Eight cycles and four memory chips are consumed to transfer a 128b block of ECC information for one of the protected sub words. As with the previous embodiments, for a same protected sub word, ECC information is kept in different memory chips than data.

When the ECC information is reshaped to match the cycle height of a data unit, its memory width is cut in half (from four chips to two). The resulting protected sub words 411, 412, like those of FIG. 3, yield an effective pair of concurrent 8+2 configurations (one effective 8+2 configuration for each protected sub word). As such, there exists a 1:4 ratio of ECC information to data within each protected sub word.
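The FIG. 4 numbers can be checked the same way. The sketch restates the chip and cycle counts from the two paragraphs above and assumes x4 chips.

```python
from fractions import Fraction

BITS_PER_CHIP = 4  # x4 devices

data_bits = 8 * 16 * BITS_PER_CHIP  # eight chips over a full burst
ecc_bits = 4 * 8 * BITS_PER_CHIP    # four chips over a half burst
ecc_lanes = (4 * 8) // 16           # same area stretched to 16 cycles
data_lanes = 8
ratio = Fraction(ecc_bits, data_bits)
print(f"effective {data_lanes}+{ecc_lanes}, ECC:data = {ratio.numerator}:{ratio.denominator}")
```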

As with prior embodiments, the same memory chips can be used to store data and ECC for different protected sub words, in which case ECC transfers are done sequentially with data transfers. Alternatively, different physical memory chips can be used, in which case more memory is consumed but concurrent/parallel transfers are possible.

As observed in FIG. 4, the 8 cycle transfer utilizes half the chips of the 16 cycle transfer. In the most efficient embodiments (but not the only possible embodiments), the unused chips of the 8 cycle transfer contain information for a next, consecutive data unit and the corresponding protected sub words to be accessed to/from the memory chips. That is, the ECC1 and ECC2 fields for the protected sub words of another 128B data unit (with a different base address than the 128B data unit observed in FIG. 4) can be transmitted in the unused portions of the 8 cycle transfer observed in FIG. 4. As such, consecutively accessed data units can be interleaved. For example, a first 16 cycle burst transfers D1 and D2 of the protected sub words for a first 128B data unit, a following second 8 cycle burst transfers ECC1 and ECC2 for the protected sub words of both the first 128B data unit and a second, following 128B unit, and a next following 16 cycle burst transfers D1 and D2 for the second, following 128B unit. Thus, all four protected sub words for a pair of 128B data units are transferred over a full, half, full burst pattern.
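The full, half, full pattern can be generated for a stream of 128B units. This is a minimal sketch. The function name and the string labels such as `"A.ECC1"` are hypothetical, and units are assumed to arrive in pairs.

```python
from typing import List, Tuple

Burst = Tuple[str, int, List[str]]  # (burst kind, cycles, blocks carried)


def interleaved_schedule(unit_ids: List[str]) -> List[Burst]:
    """Full/half/full burst pattern for consecutive pairs of 128B data units.
    The half burst carries ECC1/ECC2 of both units, each unit using half the chips."""
    if len(unit_ids) % 2:
        raise ValueError("units are scheduled in pairs")
    schedule: List[Burst] = []
    for a, b in zip(unit_ids[0::2], unit_ids[1::2]):
        schedule.append(("full", 16, [f"{a}.D1", f"{a}.D2"]))
        schedule.append(("half", 8, [f"{a}.ECC1", f"{a}.ECC2", f"{b}.ECC1", f"{b}.ECC2"]))
        schedule.append(("full", 16, [f"{b}.D1", f"{b}.D2"]))
    return schedule


for kind, cycles, blocks in interleaved_schedule(["A", "B"]):
    print(kind, cycles, blocks)
```

With this pattern a pair of 128B units takes 16 + 8 + 16 = 40 cycles, or 20 cycles per unit. By comparison, each unit would take 16 + 8 = 24 cycles if its half burst left half of the chips idle.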

FIG. 5 shows a memory system implementation including a memory controller 501, multiple memory channels 502_1 through 502_N (one or more of which may be broken down into sub-channels) and respective memory modules 503 that are connected to the memory channels 502. The memory modules may be dual in-line memory modules (DIMMs), stacked memory chip memory modules, or other types of memory modules. The memory controller 501 includes re-striping logic circuitry 504 that is able to implement any or all of the re-striping schemes described above.

The re-striping logic circuitry 504 could therefore be designed to, e.g., during a write operation, parse a received unit of write data into multiple smaller data units, where each smaller data unit forms the data component of a protected sub word. The logic would then calculate ECC information for each of the protected sub words from their respective smaller data units, and then write the protected sub words into the appropriate number of memory chips according to the striping pattern. For implementations where the original received unit of write data is larger than 64B (e.g., 128B as per the discussions of FIGS. 3 and 4), the appropriate memory chips could (but are not strictly required to in various memory system architectures) span more than one memory module and/or memory channel.

Likewise, during a read operation, the re-striping logic circuitry 504 could be designed to read the protected sub words from the appropriate memory chips in accordance with the striping pattern, and to perform error correction calculations on each protected sub word separately based on its smaller data unit and corresponding ECC information. If the protected sub words are valid, the logic would form a responsive full-size read word by combining their smaller data units.
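The write and read flows just described can be sketched in Python. This is a minimal illustration, not the controller's logic. The function names are hypothetical, and the placement of sub words onto specific chips is omitted. `zlib.crc32` stands in for the per-sub-word ECC, so this sketch only detects errors and cannot correct them.

```python
import zlib
from typing import List, Optional, Tuple

SubWord = Tuple[bytes, int]  # (smaller data unit, its check value)


def check_value(data: bytes) -> int:
    # Stand-in for per-sub-word ECC: CRC-32 detects corruption but cannot
    # correct it, unlike the chip-level ECC described in the text.
    return zlib.crc32(data)


def write_path(unit: bytes, n_sub_words: int = 2) -> List[SubWord]:
    """Parse a unit of write data into smaller data units and encode each one."""
    if len(unit) % n_sub_words:
        raise ValueError("unit of write data must split evenly")
    size = len(unit) // n_sub_words
    pieces = [unit[i * size:(i + 1) * size] for i in range(n_sub_words)]
    # Striping the encoded sub words onto specific chips is omitted here.
    return [(piece, check_value(piece)) for piece in pieces]


def read_path(sub_words: List[SubWord]) -> Optional[bytes]:
    """Check each sub word on its own; recombine only if every one is valid."""
    for data, check in sub_words:
        if check_value(data) != check:
            return None
    return b"".join(data for data, _ in sub_words)


unit = bytes(range(128))   # a 128B unit, as in FIGS. 3 and 4
stored = write_path(unit)  # two 64B sub words, each with its own check value
assert read_path(stored) == unit
```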

Additionally, some or all of the memory modules 503 may have logic circuitry to support special operations to implement the re-striping, such as a memory module that supports a command that writes to and/or reads from less than all (e.g., half) of the memory chips of a particular rank that is targeted by the command. The memory chips of the memory modules 503 can be dynamic random access memory (DRAM), byte addressable write-in-place non-volatile memory (e.g., a three dimensional cross-point architecture memory having stacks of non-volatile resistive storage cells constructed above the semiconductor chip substrate, such as Optane™ memory from Intel Corporation), or a combination of DRAM and byte addressable write-in-place non-volatile memory.

It is pertinent to recognize that other embodiments not specifically described above are nevertheless taught by the teachings provided above. For example, other embodiments may divide the data unit into four sections to create four protected sub words, divide the data unit into eight sections to create eight protected sub words, etc. Some of these embodiments may map directly onto the striping patterns described above, while others may have their own striping pattern. Such a pattern may, e.g., do any of the following:

- increase the ratio of ECC protection from pre-chip failure to post-chip failure;
- reduce the number of memory chips that are used from pre-chip failure to post-chip failure;
- confine the memory chips that are affected by the re-striping to the memory chips that existed on the memory channel or memory sub-channel that suffered the memory chip failure;
- constrain the striping pattern so that data and ECC are on different memory chips for any particular protected sub word, even though at least one memory chip stores data and ECC of different protected sub words.

It is also pertinent to recognize that, in the above description, a bad chip need not be a fully bad chip but could just be a chip with a bad area of memory. The scheme described above would then just be applied to the bad areas of memory. The bad areas of memory could be identified by registers in the memory channel controller.

Additional possible characteristics include a single memory chip that contains both ECC information and data for the same protected sub word, but where the ECC and data are stored in different “failure regions” of the particular memory chip. Here, a single memory chip is understood to have different failure regions, where different bits that are stored by the memory chip for a particular protected sub word are associated with different failure regions, and/or one or more bits that are stored by the memory chip for the sub word are associated with the same failure region. For example, if a “failure region” is associated with a particular wire within the memory chip that is replicated in the memory array, some (first) bits of a same protected sub word may be transported with one such wire, while other (second) bits of the same protected sub word may be transported with another instance of the wire. Here, the first bits are associated with a first failure region, while the second bits are associated with a second failure region. As such, the first bits may be used to store data or ECC of a protected sub word, while the second bits may be used to store the other of data/ECC of the protected sub word. Bad failure regions can be paired with other bad failure regions, or with good failure regions. For example, in a ×8 DRAM, one might be able to assume that four I/Os correspond to one failure region and another four I/Os correspond to another failure region. Hence, from an ECC perspective, this ×8 DRAM can be treated as two independent ×4 DRAMs, and a forty-bit 8+2 ECC scheme can be implemented with five ×8 memory chips.
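The ×8 example can be made concrete with a small sketch. It assumes, as the paragraph does, that DQ0-DQ3 and DQ4-DQ7 of each ×8 chip are independent failure regions. Placing ECC on the upper region of the last two chips is a hypothetical choice. That choice shows data and ECC of the same word sharing a chip without sharing a failure region.

```python
def failure_region(chip: int, dq: int) -> tuple:
    """Assumed mapping: DQ0-3 of a x8 chip form region 0, DQ4-7 form region 1."""
    return (chip, dq // 4)


X8_CHIPS = 5
regions = sorted({failure_region(chip, dq) for chip in range(X8_CHIPS) for dq in range(8)})
bus_width = X8_CHIPS * 8  # bits per transfer cycle

# Hypothetical assignment: ECC on the upper region of the last two chips, data elsewhere.
ecc_regions = [(3, 1), (4, 1)]
data_regions = [r for r in regions if r not in ecc_regions]

# Chips 3 and 4 hold data and ECC of the same word, but never in the same region.
shared_chips = {c for c, _ in ecc_regions} & {c for c, _ in data_regions}
assert not set(ecc_regions) & set(data_regions)
print(f"{len(data_regions)}+{len(ecc_regions)} over {bus_width} bits; mixed chips: {sorted(shared_chips)}")
```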

In various embodiments, any spare memory chips that remain after re-striping can be used to store even more ECC information. For example, if the original 8+2 configuration of FIG. 1a is re-striped to the approach of FIG. 2, the number of used chips changes from ten total memory chips in the configuration of FIG. 1a to eight used chips in the configuration of FIG. 2. With one of the original ten chips being deemed “bad”, there is one “spare” chip remaining after re-striping to the approach of FIG. 2. If desired, the spare chip can be used in the re-striping of FIG. 2 to contain ECC information, thereby increasing the ECC coverage per sub word even further.

It is pertinent to recognize that the D1, D2, etc. and ECC1 and ECC2 block arrangements observed in FIGS. 2, 3 and 4 are only exemplary. Generally speaking, the content of the different blocks can be allocated in many other patterns (e.g., so long as, for a same protected sub word, each memory chip failure region is assigned data or ECC but not both). For example, referring to FIG. 2, portions of ECC1 and ECC2 can be swapped such that a same memory chip stores ECC information for different protected sub words created from a same data unit.

Although embodiments above have increased ECC coverage after re-striping, note that in at least some respects actual ECC writing activity is reduced. For example, in the case of re-striping from a prior (pre-failure) 16+2 configuration to the (post-failure) approach of FIG. 3 or 4, ECC writing activity is reduced from 16 cycles to 8 cycles (alternatively, fewer ECC chips can be written to over a larger number of cycles).

Although embodiments above have emphasized full size bursts and half size bursts, other implementations can use other combinations. An example is a full size burst paired with a partial burst that is other than a “half” burst (e.g., a number of substantive cycles other than half the number of substantive cycles of a full burst, and/or a number of chips and/or memory chip I/Os other than half of the full width of chips and/or I/Os).

FIG. 6 depicts an example system. The system can use the teachings provided herein. System 600 includes processor 610, which provides processing, operation management, and execution of instructions for system 600. Processor 610 can include any type of microprocessor, central processing unit (CPU), graphics processing unit (GPU), processing core, or other processing hardware to provide processing for system 600, or a combination of processors. Processor 610 controls the overall operation of system 600, and can be or include one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.

In one example, system 600 includes interface 612 coupled to processor 610, which can represent a higher speed interface or a high throughput interface for system components that need higher bandwidth connections, such as memory subsystem 620 or graphics interface components 640, or accelerators 642. Interface 612 represents an interface circuit, which can be a standalone component or integrated onto a processor die. Where present, graphics interface 640 interfaces to graphics components for providing a visual display to a user of system 600. In one example, graphics interface 640 can drive a high definition (HD) display that provides an output to a user. High definition can refer to a display having a pixel density of approximately 100 PPI (pixels per inch) or greater and can include formats such as full HD (e.g., 1080p), retina displays, 4K (ultra-high definition or UHD), or others. In one example, the display can include a touchscreen display. In one example, graphics interface 640 generates a display based on data stored in memory 630 or based on operations executed by processor 610 or both.

Accelerators 642 can be a fixed function offload engine that can be accessed or used by a processor 610. For example, an accelerator among accelerators 642 can provide compression (DC) capability, cryptography services such as public key encryption (PKE), cipher, hash/authentication capabilities, decryption, or other capabilities or services. In some embodiments, in addition or alternatively, an accelerator among accelerators 642 provides field select controller capabilities as described herein. In some cases, accelerators 642 can be integrated into a CPU socket (e.g., a connector to a motherboard or circuit board that includes a CPU and provides an electrical interface with the CPU). For example, accelerators 642 can include a single or multi-core processor, graphics processing unit, logical execution unit, single or multi-level cache, functional units usable to independently execute programs or threads, application specific integrated circuits (ASICs), neural network processors (NNPs), “X” processing units (XPUs), programmable control logic, and programmable processing elements such as field programmable gate arrays (FPGAs). Multiple neural networks, processor cores, or graphics processing units of accelerators 642 can be made available for use by artificial intelligence (AI) or machine learning (ML) models. For example, the AI model can use or include any or a combination of: a reinforcement learning scheme, Q-learning scheme, deep-Q learning, or Asynchronous Advantage Actor-Critic (A3C), combinatorial neural network, recurrent combinatorial neural network, or other AI or ML model. Any of the accelerators mentioned above or other accelerators may use a memory system (e.g., a local memory system of the accelerator, a main memory system of a computer, etc.) that implements one or more memory chip striping improvements in response to a chip failure as described above.

Memory subsystem 620 represents the main memory of system 600 and provides storage for code to be executed by processor 610, or data values to be used in executing a routine. Memory subsystem 620 can include one or more memory devices 630, volatile memory, or a combination of such devices. The memory subsystem 620, in various embodiments, is designed to implement one or more memory chip striping improvements in response to a chip failure as described above.

Memory 630 stores and hosts, among other things, operating system (OS) 632 to provide a software platform for execution of instructions in system 600. Additionally, applications 634 can execute on the software platform of OS 632 from memory 630. Applications 634 represent programs that have their own operational logic to perform execution of one or more functions. Processes 636 represent agents or routines that provide auxiliary functions to OS 632 or one or more applications 634 or a combination. OS 632, applications 634, and processes 636 provide software logic to provide functions for system 600. In one example, memory subsystem 620 includes memory controller 622, which is a memory controller to generate and issue commands to memory 630. It will be understood that memory controller 622 could be a physical part of processor 610 or a physical part of interface 612. For example, memory controller 622 can be an integrated memory controller, integrated onto a circuit with processor 610. In some examples, a system on chip (SOC or SoC) combines into one SoC package one or more of: processors, graphics, memory, memory controller, and Input/Output (I/O) control logic.

A volatile memory is memory whose state (and therefore the data stored in it) is indeterminate if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state. One example of dynamic volatile memory includes DRAM (Dynamic Random Access Memory), or some variant such as Synchronous DRAM (SDRAM). A memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR3 (Double Data Rate version 3, original release by JEDEC (Joint Electron Device Engineering Council) on Jun. 27, 2007), DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4), LPDDR3 (Low Power DDR version 3, JESD209-3B, August 2013 by JEDEC), LPDDR4 (LPDDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide Input/Output version 2, JESD229-2, originally published by JEDEC in August 2014), HBM (High Bandwidth Memory, JESD235, originally published by JEDEC in October 2013), LPDDR5 (Low Power DDR 5, JESD209-5, originally published by JEDEC in February 2019), DDR5 (DDR version 5, JESD79-5, originally published by JEDEC in July 2020), HBM2 (HBM version 2), currently in discussion by JEDEC, or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications. The JEDEC standards are available at www.jedec.org.

While not specifically illustrated, it will be understood that system 600 can include one or more buses or bus systems between devices, such as a memory bus, a graphics bus, interface buses, or others. Buses or other signal lines can communicatively or electrically couple components together, or both communicatively and electrically couple the components. Buses can include physical communication lines, point-to-point connections, bridges, adapters, controllers, or other circuitry or a combination. Buses can include, for example, one or more of a system bus, a Peripheral Component Interconnect express (PCIe) bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, Remote Direct Memory Access (RDMA), Internet Small Computer Systems Interface (iSCSI), NVM express (NVMe), Compute Express Link (CXL), Coherent Accelerator Processor Interface (CAPI), a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus.

In one example, system 600 includes interface 614, which can be coupled to interface 612. In one example, interface 614 represents an interface circuit, which can include standalone components and integrated circuitry. In one example, multiple user interface components or peripheral components, or both, couple to interface 614. Network interface 650 provides system 600 the ability to communicate with remote devices (e.g., servers or other computing devices) over one or more networks. Network interface 650 can include an Ethernet adapter, wireless interconnection components, cellular network interconnection components, USB (universal serial bus), or other wired or wireless standards-based or proprietary interfaces. Network interface 650 can transmit data to a remote device, which can include sending data stored in memory. Network interface 650 can receive data from a remote device, which can include storing received data into memory. Various embodiments can be used in connection with network interface 650, processor 610, and memory subsystem 620.

In one example, system 600 includes one or more input/output (I/O) interface(s) 660. I/O interface 660 can include one or more interface components through which a user interacts with system 600 (e.g., audio, alphanumeric, tactile/touch, or other interfacing). Peripheral interface 670 can include any hardware interface not specifically mentioned above. Peripherals refer generally to devices that connect dependently to system 600. A dependent connection is one where system 600 provides the software platform or hardware platform or both on which operation executes, and with which a user interacts.

In one example, system 600 includes storage subsystem 680 to store data in a nonvolatile manner. In one example, in certain system implementations, at least certain components of storage 680 can overlap with components of memory subsystem 620. Storage subsystem 680 includes storage device(s) 684, which can be or include any conventional medium for storing large amounts of data in a nonvolatile manner, such as one or more magnetic, solid state drive, or optical based disks, or a combination. Storage 684 holds code or instructions and data 686 in a persistent state (e.g., the value is retained despite interruption of power to system 600). Storage 684 can be generically considered to be a “memory,” although memory 630 is typically the executing or operating memory to provide instructions to processor 610. Whereas storage 684 is nonvolatile, memory 630 can include volatile memory (e.g., the value or state of the data is indeterminate if power is interrupted to system 600). In one example, storage subsystem 680 includes controller 682 to interface with storage 684. In one example controller 682 is a physical part of interface 614 or processor 610 or can include circuits or logic in both processor 610 and interface 614.

A non-volatile memory (NVM) device is a memory whose state is determinate even if power is interrupted to the device. In one embodiment, the NVM device can comprise a block addressable memory device, such as NAND technologies, or more specifically, multi-threshold level NAND flash memory (for example, Single-Level Cell (“SLC”), Multi-Level Cell (“MLC”), Quad-Level Cell (“QLC”), Tri-Level Cell (“TLC”), or some other NAND). An NVM device can also comprise a byte-addressable write-in-place three dimensional cross point memory device, or other byte addressable write-in-place NVM device (also referred to as persistent memory), such as single or multi-level Phase Change Memory (PCM) or phase change memory with a switch (PCMS), NVM devices that use chalcogenide phase change material (for example, chalcogenide glass), resistive memory including metal oxide base, oxygen vacancy base and Conductive Bridge Random Access Memory (CB-RAM), nanowire memory, ferroelectric random access memory (FeRAM, FRAM), magneto resistive random access memory (MRAM) that incorporates memristor technology, spin transfer torque (STT)-MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a DW (Domain Wall) and SOT (Spin Orbit Torque) based device, a thyristor based memory device, or a combination of any of the above, or other memory.

A power source (not depicted) provides power to the components of system 600. More specifically, the power source typically interfaces to one or multiple power supplies in system 600 to provide power to the components of system 600. In one example, the power supply includes an AC to DC (alternating current to direct current) adapter to plug into a wall outlet. Such AC power can come from a renewable energy (e.g., solar power) power source. In one example, the power source includes a DC power source, such as an external AC to DC converter. In one example, the power source or power supply includes wireless charging hardware to charge via proximity to a charging field. In one example, the power source can include an internal battery, alternating current supply, motion-based power supply, solar power supply, or fuel cell source.

In an example, system 600 can be implemented as a disaggregated computing system. For example, the system 600 can be implemented with interconnected compute sleds of processors, memories, storages, network interfaces, and other components. High speed interconnects can be used such as PCIe, Ethernet, or optical interconnects (or a combination thereof). For example, the sleds can be designed according to any specifications promulgated by the Open Compute Project (OCP) or other disaggregated computing effort, which strives to modularize main architectural computer components into rack-pluggable components (e.g., a rack pluggable processing component, a rack pluggable memory component, a rack pluggable storage component, a rack pluggable accelerator component, etc.).

FIG. 7 depicts an example of a data center. Various ones of the above-described re-striping embodiments can be used in or with the data center of FIG. 7. As shown in FIG. 7, data center 700 may include an optical fabric 712. Optical fabric 712 may generally include a combination of optical signaling media (such as optical cabling) and optical switching infrastructure via which any particular sled in data center 700 can send signals to (and receive signals from) the other sleds in data center 700. However, optical, wireless, and/or electrical signals can be transmitted using fabric 712. The signaling connectivity that optical fabric 712 provides to any given sled may include connectivity both to other sleds in a same rack and sleds in other racks. Data center 700 includes four racks 702A to 702D and racks 702A to 702D house respective pairs of sleds 704A-1 and 704A-2, 704B-1 and 704B-2, 704C-1 and 704C-2, and 704D-1 and 704D-2. Thus, in this example, data center 700 includes a total of eight sleds. Optical fabric 712 can provide sled signaling connectivity with one or more of the seven other sleds. For example, via optical fabric 712, sled 704A-1 in rack 702A may possess signaling connectivity with sled 704A-2 in rack 702A, as well as the six other sleds 704B-1, 704B-2, 704C-1, 704C-2, 704D-1, and 704D-2 that are distributed among the other racks 702B, 702C, and 702D of data center 700. The embodiments are not limited to this example. For example, fabric 712 can provide optical and/or electrical signaling.

FIG. 8 depicts an environment 800 that includes multiple computing racks 802, each including a Top of Rack (ToR) switch 804, a pod manager 806, and a plurality of pooled system drawers. Various equipment within the rack may have memory that is implemented with one or more striping improvements as discussed above. Generally, the pooled system drawers may include pooled compute drawers and pooled storage drawers to, e.g., effect a disaggregated computing system. Optionally, the pooled system drawers may also include pooled memory drawers and pooled Input/Output (I/O) drawers. In the illustrated embodiment the pooled system drawers include an INTEL® XEON® pooled compute drawer 808, an INTEL® ATOM™ pooled compute drawer 810, a pooled storage drawer 812, a pooled memory drawer 814, and a pooled I/O drawer 816. Each of the pooled system drawers is connected to ToR switch 804 via a high-speed link 818, such as a 40 Gigabit/second (Gb/s) or 100 Gb/s Ethernet link or a 100+ Gb/s Silicon Photonics (SiPh) optical link. In one embodiment high-speed link 818 comprises an 800 Gb/s SiPh optical link.

Again, the drawers can be designed according to any specifications promulgated by the Open Compute Project (OCP) or other disaggregated computing effort, which strives to modularize main architectural computer components into rack-pluggable components (e.g., a rack pluggable processing component, a rack pluggable memory component, a rack pluggable storage component, a rack pluggable accelerator component, etc.).

Multiple ones of the computing racks 802 may be interconnected via their ToR switches 804 (e.g., to a pod-level switch or data center switch), as illustrated by connections to a network 820. In some embodiments, groups of computing racks 802 are managed as separate pods via pod manager(s) 806. In one embodiment, a single pod manager is used to manage all of the racks in the pod. Alternatively, distributed pod managers may be used for pod management operations.

Multiple rack environment 800 further includes a management interface 822 that is used to manage various aspects of the RSD environment. This includes managing rack configuration, with corresponding parameters stored as rack configuration data 824.

Embodiments herein may be implemented in various types of computing devices, such as smart phones, tablets, personal computers, and networking equipment, such as switches, routers, racks, and blade servers such as those employed in a data center and/or server farm environment. The servers used in data centers and server farms comprise arrayed server configurations such as rack-based servers or blade servers. These servers are interconnected in communication via various network provisions, such as partitioning sets of servers into Local Area Networks (LANs) with appropriate switching and routing facilities between the LANs to form a private Intranet. For example, cloud hosting facilities may typically employ large data centers with a multitude of servers. A blade comprises a separate computing platform that is configured to perform server-type functions, that is, a “server on a card.” Accordingly, each blade includes components common to conventional servers, including a main printed circuit board (main board) providing internal wiring (e.g., buses) for coupling appropriate integrated circuits (ICs) and other components mounted to the board.

Various examples may be implemented using hardware elements, software elements, or a combination of both. In some examples, hardware elements may include devices, components, processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, ASICs, PLDs, DSPs, FPGAs, memory units, logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth. In some examples, software elements may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, APIs, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an example is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints, as desired for a given implementation. It is noted that hardware, firmware and/or software elements may be collectively or individually referred to herein as “module,” “logic,” “circuit,” or “circuitry.”

Some examples may be implemented using or as an article of manufacture or at least one computer-readable medium. A computer-readable medium may include a non-transitory storage medium to store logic. In some examples, the non-transitory storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. In some examples, the logic may include various software elements, such as software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, API, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof.

According to some examples, a computer-readable medium may include a non-transitory storage medium to store or maintain instructions that when executed by a machine, computing device or system, cause the machine, computing device or system to perform methods and/or operations in accordance with the described examples. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, and the like. The instructions may be implemented according to a predefined computer language, manner or syntax, for instructing a machine, computing device or system to perform a certain function. The instructions may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.

One or more aspects of at least one example may be implemented by representative instructions stored on at least one machine-readable medium which represents various logic within the processor, which when read by a machine, computing device or system causes the machine, computing device or system to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.

The appearances of the phrase “one example” or “an example” are not necessarily all referring to the same example or embodiment. Any aspect described herein can be combined with any other aspect or similar aspect described herein, regardless of whether the aspects are described with respect to the same figure or element. Division, omission or inclusion of block functions depicted in the accompanying figures does not imply that the hardware components, circuits, software and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.

Some examples may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.

The terms “first,” “second,” and the like, herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items. The term “asserted” used herein with reference to a signal denotes a state of the signal, in which the signal is active, and which can be achieved by applying any logic level, either logic 0 or logic 1, to the signal. The terms “follow” or “after” can refer to immediately following or following after some other event or events. Other sequences of steps may also be performed according to alternative embodiments. Furthermore, additional steps may be added or removed depending on the particular applications. Any combination of changes can be used and one of ordinary skill in the art with the benefit of this disclosure would understand the many variations, modifications, and alternative embodiments thereof.

Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is otherwise understood within the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. Additionally, conjunctive language such as the phrase “at least one of X, Y, and Z,” unless specifically stated otherwise, should also be understood to mean X, Y, Z, or any combination thereof, including “X, Y, and/or Z.”

1. An apparatus, comprising: a memory controller comprising logic circuitry to write a unit of write data into a plurality of memory chips according to a striping pattern that comprises multiple protected sub words, each protected sub word comprising a smaller portion of the unit of write data and error correction code (ECC) information calculated from the smaller portion of the unit of write data.
 2. The apparatus of claim 1 wherein the striping pattern further comprises first information of more than one of the protected sub words during a full burst, and, second information of the more than one of the protected sub words during a partial burst.
 3. The apparatus of claim 1 wherein the memory controller is to implement the striping pattern in response to a memory chip failure.
 4. The apparatus of claim 3 wherein the striping pattern consumes fewer memory chips to store a unit of received write data than were used prior to the memory chip failure.
 5. The apparatus of claim 3 wherein the striping pattern uses a higher error correction coding (ECC) to data ratio than the ECC to data ratio that was used prior to the memory chip failure.
 6. The apparatus of claim 3 wherein the plurality of memory chips are components of a memory channel that suffered the memory chip failure.
 7. The apparatus of claim 1 wherein the logic circuitry is further to: process the respective data and error correction coding (ECC) of the multiple protected sub words independently; and, if the respective data is valid, combine the respective data of the multiple protected sub words to form a unit of read data.
 8. The apparatus of claim 1 wherein the striping pattern forms two protected sub words, where, the two protected sub words have different halves of the unit of write data.
 9. The apparatus of claim 1 wherein the unit of write data is 64 bytes.
 10. The apparatus of claim 1 wherein the unit of write data is 128 bytes.
 11. The apparatus of claim 1 wherein data and error correction coding (ECC) from a same protected sub word are not stored in a same failure region of a same one of the plurality of memory chips.
 12. The apparatus of claim 1 wherein at least one of the plurality of memory chips stores data and error correction coding (ECC) for different ones of the protected sub words.
 13. A computing system, comprising: a processor; memory that the processor is to access; and, a memory controller coupled between the processor and the memory, the memory controller comprising logic circuitry to write a unit of write data into a plurality of memory chips of the memory according to a striping pattern that comprises multiple protected sub words, each protected sub word comprising a smaller portion of the unit of write data and error correction coding (ECC) information calculated from the smaller portion of the unit of write data.
 14. The computing system of claim 13 wherein the striping pattern further comprises first information of more than one of the protected sub words during a full burst, and, second information of the more than one of the protected sub words during a half burst.
 15. The computing system of claim 13 wherein the memory controller is to implement the striping pattern in response to a memory chip failure.
 16. The computing system of claim 15 wherein the striping pattern consumes fewer memory chips to store a unit of received write data than were used prior to the memory chip failure.
 17. The computing system of claim 15 wherein the striping pattern uses a higher error correction coding (ECC) to data ratio than the ECC to data ratio that was used prior to the memory chip failure.
 18. The computing system of claim 15 wherein the plurality of memory chips are components of a memory channel that suffered the memory chip failure.
 19. The computing system of claim 13 wherein the logic circuitry is further to: process the respective data and error correction coding (ECC) of the multiple protected sub words independently; and, if the respective data is valid, combine the respective data of the multiple protected sub words to form a unit of read data.
 20. The computing system of claim 13 wherein the striping pattern forms two protected sub words, where, the two protected sub words have different halves of the unit of write data.
 21. The computing system of claim 13 wherein the plurality of memory chips are on a memory module that allows less than all the memory chips of a rank to be accessed.
 22. A method, comprising: in response to a memory chip failure, applying a new write striping pattern to a plurality of memory chips, wherein, the new write striping pattern comprises multiple protected sub words, each protected sub word comprising a smaller portion of a unit of write data and error correction coding (ECC) information calculated from the smaller portion of the unit of write data.
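Claims 7, 8, 19 and 20 recite forming protected sub words from different halves of a unit of write data, then processing each sub word independently on a read and combining the data only if it is valid. The following minimal Python sketch illustrates that flow under stated assumptions: it uses CRC-32 from Python's `zlib` module purely as a stand-in check value (CRC-32 detects but does not correct errors, unlike the ECC a memory controller would use), and the function names are hypothetical. It does not model chip placement or bursts.

```python
import zlib
from typing import List, Optional, Tuple

# A protected sub word modeled as (data portion, check value). The check value
# is a CRC-32 stand-in for ECC: it detects errors but cannot correct them.
SubWord = Tuple[bytes, int]


def split_into_protected_sub_words(unit: bytes, parts: int = 2) -> List[SubWord]:
    """Write path: cut the unit into equal portions, each with its own check."""
    if parts < 1 or len(unit) % parts:
        raise ValueError("unit length must divide evenly into sub words")
    size = len(unit) // parts
    chunks = [unit[i * size:(i + 1) * size] for i in range(parts)]
    # Each check value covers only its own smaller portion of the unit.
    return [(chunk, zlib.crc32(chunk)) for chunk in chunks]


def combine_if_valid(sub_words: List[SubWord]) -> Optional[bytes]:
    """Read path: check every sub word independently; combine only if all pass."""
    for data, check in sub_words:
        if zlib.crc32(data) != check:
            return None  # assumption: a real controller would attempt ECC correction here
    return b"".join(data for data, _ in sub_words)


unit = bytes(range(64))  # a 64-byte unit of write data
stored = split_into_protected_sub_words(unit)
assert combine_if_valid(stored) == unit

# Corrupt one byte of the first sub word's data; only its own check fails.
damaged = [(b"\xff" + stored[0][0][1:], stored[0][1]), stored[1]]
assert combine_if_valid(damaged) is None
```

Because each check value is computed from its own half only, a fault confined to one half is caught by that half's check without involving the other sub word.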